Interpretability-Aware Pruning for Efficient Medical Image Analysis
Malik, Nikita, Seth, Pratinav, Singh, Neeraj Kumar, Chitroda, Chintan, Sankarapu, Vinay Kumar
Deep learning has driven significant advances in medical image analysis, yet its adoption in clinical practice remains constrained by the large size and lack of transparency in modern models. Advances in interpretability techniques such as DL-Backtrace, Layer-wise Relevance Propagation, and Integrated Gradients make it possible to assess the contribution of individual components within neural networks trained on medical imaging tasks. In this work, we introduce an interpretability-guided pruning framework that reduces model complexity while preserving both predictive performance and transparency. By selectively retaining only the most relevant parts of each layer, our method enables targeted compression that maintains clinically meaningful representations. Experiments across multiple medical image classification benchmarks demonstrate that this approach achieves high compression rates with minimal loss in accuracy, paving the way for lightweight, interpretable models suited for real-world deployment in healthcare settings.
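The abstract's core idea, keeping only the most relevant units of each layer, can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it assumes per-unit relevance scores (e.g. summed LRP or Integrated Gradients attributions) are already available, and zeroes out the weights of low-relevance output units in a dense layer. The function name and `keep_ratio` parameter are illustrative choices.

```python
import numpy as np

def prune_by_relevance(weights, relevance, keep_ratio=0.3):
    """Zero out the least-relevant output units of one dense layer.

    weights:   (out_units, in_units) weight matrix
    relevance: per-output-unit relevance scores, shape (out_units,)
    """
    n_keep = max(1, int(round(keep_ratio * len(relevance))))
    keep = np.argsort(relevance)[-n_keep:]   # most relevant units
    mask = np.zeros(len(relevance), dtype=bool)
    mask[keep] = True
    pruned = weights.copy()
    pruned[~mask, :] = 0.0                   # drop low-relevance units
    return pruned, mask

# toy layer: 4 output units; relevance favors units 1 and 3
w = np.ones((4, 5))
r = np.array([0.1, 0.9, 0.2, 0.8])
pruned, mask = prune_by_relevance(w, r, keep_ratio=0.5)
```

In practice the zeroed rows would be physically removed (structured pruning) and the model briefly fine-tuned, but the selection criterion, attribution mass rather than weight magnitude, is the part the paper's framing turns on.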
Towards eliciting latent knowledge from LLMs with mechanistic interpretability
Cywiński, Bartosz, Ryd, Emil, Rajamanoharan, Senthooran, Nanda, Neel
As language models become more powerful and sophisticated, it is crucial that they remain trustworthy and reliable. There is concerning preliminary evidence that models may attempt to deceive or keep secrets from their operators. To explore the ability of current techniques to elicit such hidden knowledge, we train a Taboo model: a language model that describes a specific secret word without explicitly stating it. Importantly, the secret word is not presented to the model in its training data or prompt. We then investigate methods to uncover this secret. First, we evaluate non-interpretability (black-box) approaches. Subsequently, we develop largely automated strategies based on mechanistic interpretability techniques, including logit lens and sparse autoencoders. Evaluation shows that both approaches are effective in eliciting the secret word in our proof-of-concept setting. Our findings highlight the promise of these approaches for eliciting hidden knowledge and suggest several promising avenues for future work, including testing and refining these methods on more complex model organisms. This work aims to be a step towards addressing the crucial problem of eliciting secret knowledge from language models, thereby contributing to their safe and reliable deployment.
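One of the techniques the abstract names, the logit lens, is easy to sketch: project each layer's residual-stream state through the unembedding matrix and read off which tokens the model is "thinking of" at that depth, here applied to hunt for a secret word surfacing mid-network. This is a toy illustration with made-up dimensions and vocabulary, not the authors' pipeline.

```python
import numpy as np

def logit_lens(hidden_states, W_U, vocab, top_k=3):
    """For each layer's residual-stream vector, project through the
    unembedding matrix W_U and return the top-k tokens at that depth.

    hidden_states: list of (d_model,) vectors, one per layer
    W_U:           (d_model, vocab_size) unembedding matrix
    vocab:         list of token strings, len == vocab_size
    """
    readouts = []
    for h in hidden_states:
        logits = h @ W_U
        top = np.argsort(logits)[-top_k:][::-1]  # highest logits first
        readouts.append([vocab[i] for i in top])
    return readouts

# toy: 2-dim residual stream, 3-token vocabulary
vocab = ["cat", "dog", "sun"]
W_U = np.array([[3.0, 1.0, 0.0],
                [0.0, 0.0, 5.0]])
layers = [np.array([1.0, 0.0]),   # early layer reads out "cat"
          np.array([0.0, 1.0])]   # later layer reads out "sun"
reads = logit_lens(layers, W_U, vocab, top_k=1)
```

If a Taboo model's secret word appears among the intermediate readouts even though it is never emitted, that is exactly the kind of elicited hidden knowledge the paper investigates.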
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
Mancini, Eleonora, Paissan, Francesco, Torroni, Paolo, Ravanelli, Mirco, Subakan, Cem
Speech impairments in Parkinson's disease (PD) provide significant early indicators for diagnosis. While models for speech-based PD detection have shown strong performance, their interpretability remains underexplored. This study systematically evaluates several explainability methods to identify PD-specific speech features, aiming to support the development of accurate, interpretable models for clinical decision-making in PD diagnosis and monitoring. Our methodology involves (i) obtaining attributions and saliency maps using mainstream interpretability techniques, (ii) quantitatively evaluating the faithfulness of these maps and their combinations obtained via union and intersection through a range of established metrics, and (iii) assessing the information conveyed by the saliency maps for PD detection from an auxiliary classifier. Our results reveal that, while explanations are aligned with the classifier, they often fail to provide valuable information for domain experts.
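Step (ii) of the methodology, combining saliency maps via union and intersection and scoring their faithfulness, can be sketched with binary masks. The snippet below is a simplified illustration under stated assumptions: attributions are binarized at a quantile threshold, and "faithfulness" is approximated as the classifier-score drop when the salient region is removed; the quantile level and the toy scoring function are not from the paper.

```python
import numpy as np

def binarize(attr, q=0.6):
    """Keep attributions above the q-quantile as a binary saliency mask."""
    return attr >= np.quantile(attr, q)

def combine(mask_a, mask_b, mode="union"):
    """Merge two saliency masks by union or intersection."""
    return mask_a | mask_b if mode == "union" else mask_a & mask_b

def faithfulness(score_fn, x, mask):
    """Score drop when the salient region is zeroed; larger = more faithful."""
    return score_fn(x) - score_fn(x * ~mask)

# toy attributions from two different explainers over 5 features
a = np.array([0.9, 0.1, 0.8, 0.0, 0.2])
b = np.array([0.0, 0.7, 0.9, 0.1, 0.2])
ma, mb = binarize(a), binarize(b)

x = np.ones(5)
score = lambda v: float(v[2] + 0.5 * v[4])  # stand-in classifier logit
drop_union = faithfulness(score, x, combine(ma, mb, "union"))
```

The paper's point survives the simplification: a combined map can be faithful to the classifier (large score drop) while still telling a clinician little about PD-specific speech features.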
Reframing the Brain Age Prediction Problem to a More Interpretable and Quantitative Approach
Gianchandani, Neha, Dibaji, Mahsa, Bento, Mariana, MacDonald, Ethan, Souza, Roberto
Deep learning models have achieved state-of-the-art results in estimating brain age, an important brain health biomarker, from magnetic resonance (MR) images. However, most of these models only provide a global age prediction, and rely on techniques such as saliency maps to interpret their results. These saliency maps highlight regions in the input image that were significant for the model's predictions, but they are hard to interpret, and saliency map values are not directly comparable across different samples. In this work, we reframe the age prediction problem from MR images to an image-to-image regression problem where we estimate the brain age for each brain voxel in MR images. We compare voxel-wise age prediction models against global age prediction models and their corresponding saliency maps. The results indicate that voxel-wise age prediction models are more interpretable, since they provide spatial information about the brain aging process, and they benefit from being quantitative.
Truthful Meta-Explanations for Local Interpretability of Machine Learning Models
Mollas, Ioannis, Bassiliades, Nick, Tsoumakas, Grigorios
The integration of automated, machine-learning-based systems into a wide range of tasks has expanded as a result of their performance and speed. Although there are numerous advantages to employing ML-based systems, if they are not interpretable they should not be used in critical, high-risk applications where human lives are at stake. To address this issue, researchers and businesses have been focusing on finding ways to improve the interpretability of complex ML systems, and several such methods have been developed. Indeed, so many techniques now exist that it is difficult for practitioners to choose the best one for their application, even with the help of evaluation metrics. As a result, the demand for a selection tool, a meta-explanation technique based on a high-quality evaluation metric, is apparent. In this paper, we present a local meta-explanation technique built on top of the truthfulness metric, a faithfulness-based metric. We demonstrate the effectiveness of both the technique and the metric by concretely defining all the concepts and through experimentation.
An Attention Matrix for Every Decision: Faithfulness-based Arbitration Among Multiple Attention-Based Interpretations of Transformers in Text Classification
Mylonas, Nikolaos, Mollas, Ioannis, Tsoumakas, Grigorios
Transformers are widely used in natural language processing, where they consistently achieve state-of-the-art performance. This is mainly due to their attention-based architecture, which allows them to model rich linguistic relations between (sub)words. However, transformers are difficult to interpret. Being able to provide reasoning for its decisions is an important property for a model in domains where human lives are affected. With transformers finding wide use in such fields, the need for interpretability techniques tailored to them arises. We propose a new technique that selects the most faithful attention-based interpretation among the several ones that can be obtained by combining different head, layer and matrix operations. In addition, two variations are introduced towards (i) reducing the computational complexity, thus being faster and friendlier to the environment, and (ii) enhancing the performance in multi-label data. We further propose a new faithfulness metric that is more suitable for transformer models and exhibits high correlation with the area under the precision-recall curve based on ground truth rationales. We validate the utility of our contributions with a series of quantitative and qualitative experiments on seven datasets.
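The arbitration idea, enumerate several attention-based interpretations obtained with different head/layer reductions, then keep the most faithful one, can be sketched compactly. The snippet below is an illustrative simplification: it reads token importance from the [CLS] row of the attention tensor, tries three reductions (the paper explores many more combinations of head, layer, and matrix operations), and plugs in a caller-supplied faithfulness function.

```python
import numpy as np

def candidate_interpretations(attn):
    """attn: (layers, heads, seq, seq) attention tensor.
    Yield (name, token-importance vector) for a few head/layer
    reductions, reading importance from the [CLS] row (index 0)."""
    cls_row = attn[:, :, 0, :]                         # (layers, heads, seq)
    yield "mean_heads_last_layer", cls_row[-1].mean(axis=0)
    yield "max_heads_last_layer",  cls_row[-1].max(axis=0)
    yield "mean_heads_all_layers", cls_row.mean(axis=(0, 1))

def most_faithful(attn, faithfulness_fn):
    """Keep the reduction whose importance vector scores highest."""
    return max(candidate_interpretations(attn),
               key=lambda nv: faithfulness_fn(nv[1]))

# toy tensor: 2 layers, 2 heads, 3 tokens; one late head fixates on token 2
attn = np.zeros((2, 2, 3, 3))
attn[0, 0, 0] = [0.0, 0.0, 0.1]
attn[0, 1, 0] = [0.0, 0.0, 0.1]
attn[1, 0, 0] = [0.0, 0.0, 0.2]
attn[1, 1, 0] = [0.0, 0.0, 0.9]
name, imp = most_faithful(attn, lambda v: v[2])  # stand-in faithfulness score
```

A real faithfulness function would measure the model's score change when top-ranked tokens are removed; the per-decision arbitration means different inputs may end up with different winning reductions, which is the paper's titular point.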
Dealing with Various Cancers using Machine Learning, Part 2 (AI Health Care Series)
Abstract: The paper proposes a novel hybrid discovery Radiomics framework that simultaneously integrates temporal and spatial features extracted from non-thin chest Computed Tomography (CT) slices to predict Lung Adenocarcinoma (LUAC) malignancy with minimum expert involvement. Lung cancer is the leading cause of mortality from cancer worldwide and has various histologic types, among which LUAC has recently been the most prevalent. LUACs are classified as pre-invasive, minimally invasive, and invasive adenocarcinomas. Timely and accurate knowledge of lung nodule malignancy leads to a proper treatment plan and reduces the risk of unnecessary or late surgeries. Currently, chest CT is the primary imaging modality used to assess and predict the invasiveness of LUACs.
Exploration of Interpretability Techniques for Deep COVID-19 Classification using Chest X-ray Images
Chatterjee, Soumick, Saad, Fatima, Sarasaen, Chompunuch, Ghosh, Suhita, Krug, Valerie, Khatun, Rupali, Mishra, Rahul, Desai, Nirja, Radeva, Petia, Rose, Georg, Stober, Sebastian, Speck, Oliver, Nürnberger, Andreas
The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging such as X-ray and Computed Tomography (CT), combined with the potential of Artificial Intelligence (AI), plays an essential role in supporting the medical staff in the diagnosis process. To this end, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their Ensemble have been used in this paper to classify COVID-19, pneumoniae and healthy subjects using Chest X-Ray images. Multi-label classification was performed to predict multiple pathologies for each patient, if present. Foremost, the interpretability of each of the networks was thoroughly studied using local interpretability methods - occlusion, saliency, input X gradient, guided backpropagation, integrated gradients, and DeepLIFT - and using a global technique - neuron activation profiles. The mean Micro-F1 score of the models for COVID-19 classifications ranges from 0.66 to 0.875, and is 0.89 for the Ensemble of the network models. The qualitative results depicted the ResNets to be the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before making the decision regarding the best-performing model.
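Of the local methods listed, occlusion is the most self-contained to illustrate: slide a blanking patch over the image and record how much the classifier's score drops at each position. The sketch below is a minimal single-channel version with an arbitrary patch size and a stand-in scoring function, not the study's actual models or settings.

```python
import numpy as np

def occlusion_map(score_fn, image, patch=4, baseline=0.0):
    """Slide a patch x patch occluder over a 2-D image; the importance of
    each patch position is the drop in score_fn when that region is
    replaced by the baseline value."""
    H, W = image.shape
    heat = np.zeros((H // patch, W // patch))
    base_score = score_fn(image)
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            heat[i // patch, j // patch] = base_score - score_fn(occluded)
    return heat

# toy 8x8 "X-ray" and a stand-in model that only looks at the top-left
img = np.ones((8, 8))
score = lambda x: float(x[:4, :4].sum())
heat = occlusion_map(score, img, patch=4)
```

Regions whose occlusion barely changes the score get near-zero heat, which is how such maps expose models that attend to clinically irrelevant areas.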
An Interactive Interpretability System for Breast Cancer Screening with Deep Learning
Deep learning methods, in particular convolutional neural networks, have emerged as a powerful tool in medical image computing tasks. While these complex models provide excellent performance, their black-box nature may hinder real-world adoption in high-stakes decision-making. In this paper, we propose an interactive system to take advantage of state-of-the-art interpretability techniques to assist radiologists with breast cancer screening. Our system integrates a deep learning model into the radiologists' workflow and provides novel interactions to promote understanding of the model's decision-making process. Moreover, we demonstrate that our system can take advantage of user interactions progressively to provide finer-grained explainability reports with little labeling overhead. Due to the generic nature of the adopted interpretability technique, our system is domain-agnostic and can be used for many different medical image computing tasks, presenting a novel perspective on how we can leverage visual analytics to transform originally static interpretability techniques to augment human decision making and promote the adoption of medical AI.
Explain Yourself - A Primer on ML Interpretability & Explainability
The project to define what the late Marvin Minsky refers to as a suitcase word -- words that have so much packed inside them, making it difficult for us to unpack and understand this embedded intricacy in its entirety -- has not been without its fair share of challenges. The term does not have a single agreed-upon definition, with the dimensions of description shifting from optimization or efficient search space exploration to rationality and the ability to adapt to uncertain environments, depending on which expert you ask. The confusion becomes more salient when one hears news of machines achieving super-human performance in activities like Chess or Go -- traditional stand-ins for high intellectual aptitude -- but failing miserably in tasks like grabbing objects or moving across uneven terrain, which most of us do without thinking. But several themes do emerge when we try to corner the concept. Our ability to explain why we do what we do makes a fair number of appearances in the list of definitions proposed by multiple disciplines.